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Extracting parameter constraints from cosmological observations requires accurate determination 
of the covariance matrix for use in the likelihood function. We show here that uncertainties in 
the elements of the covariance matrix propagate directly to increased uncertainties in cosmological 
parameters. When the covariance matrix is determined by simulations, the resulting variance of the 
each parameter increases by a factor of order 1 + Nt/N s where JVf, is the number of bands in the 
measurement and N s is the number of simulations. 



I. INTRODUCTION 

Upcoming galaxy surveys [IH3 aim to measure cosmo- 
logical parameters at the percent level. Achieving this 
lofty goal will require overcoming a number of well-known 
theoretical systematics: bias in translating the matter 
distribution to the galaxy distribution (HJ [7] , uncertain- 
ties in the predictions for the dark matter spectrum [El[9], 
baryonic contamination of the power spectrum in weak 
lensing [TO1 E] , outliers in photometric redshifts [12] , ac- 
curate predictions of the halo mass function [13] . and 
many others. 

All of these are tied to making accurate predictions 
for the cosmological observable, be it cluster abundance, 
weak lensing power spectrum, or the position of the Bary- 
onic Acoustic Oscillation peaks. Here we focus on the 
effect of uncertainty not in the observable but in the co- 
variance matrix of the observable, an essential ingredient 
in transforming the predictions and observations into pa- 
rameter constraints. For simplicity throughout, we focus 
on the case when the likelihood is Gaussian so parameter 
constraints are obtained by minimizing 

N b 
X 2 ( P ) = V (xf - Xi (p)) Cr/ (^ - Xj (p)) (i) 



case when the covariance matrix is estimated from sim- 
ulations and dub the additional uncertainty simulation 
covariance error. Simulation covariance error is straight- 
forward to compute when the measurements x are Gaus- 
sian distributed, the dependence of the covariance on 
cosmology is neglected, and the sample covariance es- 
timator is used. Then, the simulation covariance error 
enhances the variance of every parameter by a factor of 
order (1 + Nb/N s ) with N s the number of simulations 
used for the estimate. We go beyond the Gaussian case 
with the example of the weak lensing power spectrum, 
where we use existing simulations to compute the simu- 
lation covariance error. The degradation is very similar 
to the Gaussian case. We conclude by tabulating the 
simulation covariance error for existing surveys. 



II. SIMPLE EXAMPLE 

Suppose the set of measurements xf each is designed 
to measure a single parameter x, and consider the case 
when the covariance matrix is diagonal, so Cy = 5ijO~f. 
Then, the inverse of the covariance matrix ^ = C _1 is 
also diagonal with elements ^ = cr~ 2 . In this simple 
case, we need to minimize 
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where p is the set of parameters; xf is the data collected 
in Nb bands (for example, the power spectrum of weak 
lensing at various multipole moments and redshifts or the 
cluster abundance in mass and redshift bins); Xi(p) is the 
set of predictions for these measurements which depend 
on the parameters; and C is the covariance matrix. We 
assume here that C is independent of p and therefore do 
not include the In \C\ normalization term in Eq. |l]). 

In this language, most of the work about systematics 
to date has been directed at obtaining accurate predic- 
tions for the Xi(p), while here we focus on the effect of 
mis-estimating the covariance matrix C. Previous work 
on covariance errors focused on the bias in the inverse co- 
variance estimate [14] and uncertainties in parameter er- 
rors |15) . Here we derive an expression for the additional 
variance of estimators of parameters due to the uncer- 
tainties in the covariance matrix. We then focus on the 



X 2 (x)=Y(xt-x) 2 * l 






in so doing, we arrive at an estimate for x: 



(2) 



(3) 



The uncertainty on this estimate can be obtained by com- 
puting ((x — x) 2 ), which leads to 



Ax 2 = 



Ei**] a 



J/ -x 2 . 



(4) 



The angular brackets around xfxf refer to an average 



over the distribution from which the xf are drawn. This 
distribution is assumed to be Gaussian with mean x and 



variance C*, where ' indicates this is the true variance, 
not necessarily equal to the covariance C (or its inverse 
^>) used to estimate x. Therefore, the variance of our 
estimator is 



Ax 2 = 



2_^CM 



Ei*< 



(5) 



If we had access to the true covariance matrix, then Cj^i 
would be equal to unity and the sum in the numerator 
would be simply equal to that in the denominator, leaving 
the variance on our estimator to be Ax 2 — l/X^^i 
which, in the limit of equal errors on each of the Nf, 
measurements, reduces to the standard a 2 /N b . 

Let's consider though the impact of not knowing ex- 
actly what the covariance matrix is. Write 



$i = W + A* 2 



Then the error on x is 
Ax 2 = — 



E^+A*,-) 



2 Z_^ 



Cj [*J + A*i 



(6) 



(7) 



Taylor expanding leads to 
1 



Ax 1 



new terms. 



(8) 



The first set of these new terms are linear in A 'I'. These 
lead to fluctuations in the error, meaning that the error 
we assign to our estimator will be wrong [15] . However, 
Av? is just as likely to fluctuate up as it is down, so the 
linear terms do not lead to a systematic bias on the er- 
ror, only an uncertainty on the error. The second set of 
terms is quadratic in A\l/, and this set is more pernicious 
as it leads to a larger error in the estimator of x. That 
is, the estimated value of x will be drawn from a distri- 
bution with a systematically larger variance than if the 
covariance matrix were known exactly. 

Let's compute this error in our simple model. The 
second order terms are 



Ax" 



second order 






(9) 



Suppose the fluctuations in the covariance matrix are 
such that [IS] 

(A* 2 A^) = ccSyVl (10) 

Then, the first term in Eq. (JsJ) will be of order N b ~ 2 . The 
second on the other hand is of order N^ 1 so it dominates 
and we are left with 



Ax 2 = 



l + a 



(11) 



If the uncertainty in the covariance matrix is driven by 
a finite number of simulations iV, , then we will see that 



a ~ 1/N S . We call the new term simulation covariance 
error, and it simply increases the errors on our estimate 
of x. Although one can drive this error down by run- 
ning many simulations, the number of (expensive) simu- 
lations required in the era of percent level measurements 
is apparently greater than a hundred, difficult but man- 
ageable. Unfortunately, this very simple case of diagonal 
errors does not capture the full danger of the situation. 
In the more realistic case that the covariance matrix is 
not diagonal, a scales as Nb/N s , so if there are measure- 
ments in a large number of bands, it will become harder 
and harder to reduce the covariance error. 



III. COVARIANCE ERROR IN THE GENERAL 
CASE 

We now generalize this treatment in three ways: First, 
we allow the covariance matrix to have off-diagonal ele- 
ments, so ^ij = C'j 1 is no longer just a diagonal matrix. 
Second, we allow for more than one parameter; instead of 
x, we envision fitting for a full set of parameters, p a . Fi- 
nally, the measurements are likely not direct estimates of 
the parameters. If we call the data in JV& bands xf , then 
we want to extract values of the cosmological parameters 
p a from these measurements. The theoretical predictions 
for these measurements, call them Xi depend on the pa- 
rameters: Xi — Xiipa), usually in some complicated way. 
For simplicity, we shift all parameters so the true values 
are equal to 0. Then the predictions Xi(p = 0) are equal 
to the true values x\. The measured values will not be 
exactly equal to x , but we expect the mean over many 
realizations to equal to the true set: 



and the spread is given by the covariance matrix 



C 



(Otf-sDOr? -a*)) 



(12) 



(13) 



where again superscript ' denotes the true value. We will 
extract the best fit values of the parameters by minimiz- 
ing Eq. (fTl). Note again that the covariance matrix here 
is not equal to the true one; this is the effect we want to 
explore: what happens to our parameter extraction when 
the covariance matrix is wrong? 

Let's decompose the % 2 into two pieces: 



x 2 Cp) = XoCp) + Ax 2 Cp) 



(14) 



where 



X 2 = 5>? - x t (p))(C%\x* - x 3 {p)) (15) 



and the term due to the uncertainty in the covariance 
matrix is 

A X 2 = J2(4 ~ x t (p))^M - *i(P)) ( 16 ) 



where 



At^C: 1 -^ 1 . 



(17) 



Both Xo an d A;^ 2 are functions of p, and we can Taylor 
expand both around p = 0. Apart from an irrelevant 
constant, the standard piece is 



*■-- ^'^C^ 1 ^ 



xob) ^ -2 



ij 



dp a 



Xj)p a + F a pp a pp (18) 



where 



F a p 



1 ^ 



2 dp a dpp 
dxi 

OPa 
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(19) 



The approximate equality on the second line follows since 
operating with the derivative twice on x l leaves a factor 



of. 



which averages to zero. Before turning to the 



effects of the new piece, it is worth recalling the derivation 
for the mean and variance of the estimator for p a using 
the standard terms. Minimizing the Taylor expanded Xo 
with respect to p a leads to the estimator 



ij r 



H*$ 



x% 



(20) 



Since ((x^ — Xj)) = 0, the mean of this estimator is zero, 
equal to the true value, so the estimator is unbiased. The 
expected variance is obtained by squaring Eq. ( 20 ) and 
using the fact that ((x d 
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(21) 



where the second equality follows from recognizing the 
sum over i,j as the definition of F and then setting 
F~ l F — I. So F^ 1 is the projected covariance matrix 
on the parameters if C is known exactly. 

To account for the effect of the uncertainty in the co- 
variance matrix, we now Taylor expand A% 2 in Eq. (14): 



Ax 2 



with 



-2£2r A *«(*? 



dp a 



x\)Va 



AF a0PaPfi (22) 



AR 



a/3 



Ef^ 

4^ dp a 



dx j 
dpp' 



(23) 



The changes to x 2 translate into a new estimator for the 
parameters: 



p a = [F + AF]-£ 



dxj 

dp a ' 



[*' 



**M*? 



(24) 



Just as in the toy model of SJTTJ we can expand this es- 
timator in powers of A\&, and - subject to the caveats 
mentioned below - the estimator will remain unbiased 
but its variance will increase. 

Although we are interested in the terms second order 
in A^ as these lead to larger errors on the parameters, 
it is worth pausing to comment here on two situations 
where the linear terms could lead to a bias: (i) when the 
covariance matrix depends on the parameters and this 
dependence is ignored by fixing C and (ii) when the fluc- 
tuations in A\& are correlated with fluctuations in the 
data. To illustrate consider the simple situation where 
the elements of the inverse covariance matrix are mono- 
tonically decreasing functions of p (e.g., in the diagonal 
case, when p is the amplitude, the cosmic variance will be 
larger when p increases and therefore elements of the in- 
verse covariance matrix will be smaller when p is greater 
than zero). Then, the assumed fixed value of ^f will be 
less than the true value when p < and greater than the 
true value when p > 0; equivalently A "J will start neg- 
ative and turn positive as p passes through zero. If the 
fluctuations in A^/ are uncorrelated with fluctuations in 



the data, then the first term in Eq. ( 22 ) has mean zero. 



The second will be negative when p < and positive 
when p > 0. This will then mistakenly favor regions of 
parameter space with p < 0. A full understanding of the 
bias induced by neglecting the parameter dependence of 
the covariance matrix is beyond the scope of this paper 
(in particular, the determinant in the prefactor of the 
likelihood also needs to be considered) jTHj , but this sim- 
ple example makes some of the dangers explicit. The 
second potential bias occurs when (A^(a; d — x*)) is non- 
zero. This happens most obviously when the data itself 
is used to generate the covariance matrix. In that case, 
upwards fluctuations in the data would lead to down- 
wards fluctuations in A^, so - taking into account the 
overall minus sign - the coefficient of the linear term in 
Eq. (22) would be positive. This change will increase 
the estimated value of p. If the fluctuation in the data 
was negative, there would be a positive fluctuation in 
AvP, again leading to a positive linear coefficient in A% 2 . 
Again, the bias would push to larger values of p. The 
conclusion is that a correlation between the data and the 
covariance matrix may induce a parameter bias. In the 
simple case where the fluctuations in the covariance ma- 
trix are positive correlated with fluctuations in the data 
and the derivative with respect to the parameters are also 
monotonically increasing, the parameters will be biased 
high. 

We now isolate terms quadratic in A 1 ?, as these lead 
to larger errors in the estimator: 



(PaPp) 



= F' 



[F 



dx 

dp 
- 1 AFF _1 AFF- 1 



J^,(A^(A*W 



a/3 ' 



F 



P'P 



(25) 



Here the angular brackets denote the expectation over 
the random values of x d drawn from the Gaussian dis- 
tribution with mean x{p = 0) and variance C . We have 



not (yet) computed the expectation of the fluctuations 
in , 3>. Note that this expression reduces to Eq. d9l) in the 
1-parameter, diagonal covariance matrix case. To com- 
plete the calculation, we need an expression for variance 
of the fluctuations in A*$>. Let us write these generically 
as 

(26) 



Inserting this expression into Eq. ( 25 ) leads to 



(PaP/s) 



BF~UN b -N p ) 



(27) 



where N p is the number of parameters in the fit and has 
the restriction, N p < N b . Eq. (27) is our main result, 



demonstrating that uncertainty in the covariance matrix 
propagates directly to a new source of uncertainty in the 
estimate of parameters. This uncertainty is proportional 
to F~g , which is equal to the parameter covariance in the 
absence of this additional error. So covariance error does 
not alter the shape of the constraints, but does inevitably 
lead to looser constraints. 

A simple way to think of this degradation is to re- 
call that the parameter covariance matrix is inversely 
proportional to /sky, the fraction of sky covered by a 
survey. Covariance error enters in an identical way, so 
if the new variance captured in Eq. (27) has coefficient 
B(N b — N p ) equal to 0.1, for example, the result is equiv- 
alent to throwing away 10% of the data set. 
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FIG. 1: The coefficient B from Eq. (261 as a function of the 



number of simulation realizations using the lensing two-point 
correlation function ('corr') and power spectrum ('power') 
from |18l I19| (with a delta-function source distribution at 
2 = 1). There are 1000 realizations of the simulated two- 
point functions. We take the sample covariance using all 1000 
simulations as a reference, and compare with the sample co- 
variance using subsets of N s simula tion s. The dashed black 
line shows the prediction from Eq. (28 1 with N b = 24. The 



error bars indicate the standard error on B as a regression 
coefficient fit simultaneously to the N b x N{? components of 
Cov(A*). 



A. Gaussian limit 

Taylor et al. [15 computed the values of A and B in 
the Gaussian case (after correcting for the bias in the 
inverse covariance estimator |14j): 



A 

B 



(N s -N b -l)(N s -N b -4) 

N s -N b -2 
(N s -N b -l)(N s -N b -4) 



(28) 



As in the toy model of qTTJ in the (common) limit that 
N s 3> N b ^> N p , the variance is enhanced over the stan- 
dard variance by a factor of (1 + N b /N s ). This is our 
main conclusion. 



resulting estimate of B is shown in Figure [T] compared 
with the Gaussian prediction. It is seen that, even for 
this highly non-gaussian field, Eq. (28) gives a good fit 
to the simulation samples. There are two reasons one 
might expect B to exhibit a different dependence on N s , 
1) the two-point function of a Gaussian random field is 
not itself Gaussian distributed, 2) nonlinear gravitational 
evolution skews the statistics of the cosmological mass 
density field away from Gaussian. However, because the 
two-point function estimator is a sum of squares of the 
density perturbations, the distribution of the estimator 
may tend to a Gaussian as the number of modes in a 
(wavenumber or angular) bin becomes large. Figure [I] is 
consistent with this explanation. 



B. Weak Lensing Spectra 



We can compute simulation covariance error for non- 
guassian fields by using a subset of available simulations. 
As an example, we use the suite of weak lensing simu- 
lations from [T51 Qj|] , assuming that the true covariance 
matrix is obtained from the scatter in all the simulations 
(1000 total). Then using only some of the simulations, 
we estimate A 1 ? and therefore B by taking the difference 
in ^ from the smaller and full set of simulations. The 



C. Current surveys 

Table U demonstrates the effect of simulation covari- 
ance error for some recently published cosmological sur- 
veys (which estimated covariance matrices from simula- 
tion realizations rather than from the data). We find 
that the degradation ranges from 5-15%. 



TABLE I: Increase in the variance of each parameter due 
to simulation covariance errors for some recently published 
survey analyses. 



Survey 




N s 


N b 


Fractional Increase 
in Variance 


BOSS [2D] 




600 


41 


7% 


DLS [21] 




1000 


60 


6% 


CHFTLens 


[22] 


184 


24 


13% 



IV. CONCLUSIONS 

We derived a new contribution to parameter uncer- 
tainties from the uncertainty in sample data covariance 
matrices estimated from simulations. This error adds in 
quadrature with other sources of parameter uncertainty 
and scales with the ratio of the number of data bins to 
the number of simulation realizations. 

Current surveys use hundreds of simulations, but even 
this large number leads to an underestimate of parameter 
uncertainties by ~5-15%. Future surveys, which will be 
sensitive enough to measure in hundreds of bins will re- 



quire of order 10 4 simulation realizations (per cosmolog- 
ical model) to prevent 5-10% degradation in the param- 
eter uncertainties. Mitigation schemes such as shrink- 
age estimators [25] , emulators [THl Hi], and large-scale 
mode-resampling [35] will be important to reduce these 
computational requirements to tractable levels. 

When the covariance matrix varies with cosmology (as 
is generally the case), there will be additional contribu- 
tions to the simulation covariance error. We will derive 
these contributions in future work, but expect them to 
be sub-dominant to the primary result we present in this 
paper as long as the model for the cosmology-dependent 
covariance is accurate enough to ensure that the fluctua- 
tions, A\P, in the covariance estimator are approximately 
independent of cosmology. 
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